Results 1 - 20 of 48
1.
Patterns (N Y) ; 5(1): 100906, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38264714

ABSTRACT

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.

2.
J Pers Med ; 12(8), 2022 Aug 17.
Article in English | MEDLINE | ID: mdl-36013271

ABSTRACT

The Mass General Brigham Biobank (formerly Partners HealthCare Biobank) is a large repository of biospecimens and data linked to extensive electronic health record data and survey data. Its objective is to support and enable translational research focused on genomic, environmental, biomarker and family history associations with disease phenotypes. The Biobank has enrolled more than 135,000 participants, generated genomic data on more than 65,000 of its participants, distributed approximately 153,000 biospecimens, and served close to 450 institutional studies with biospecimens or data. Although the Biobank has been successful, based on some measures of output, this has required substantial institutional investment. In addition, several challenges are ongoing, including: (1) developing a sustainable cost model that doesn't rely as heavily on institutional funding; (2) integrating Biobank operations into clinical workflows; and (3) building a research resource that is diverse and promotes equity in research. Here, we describe the evolution of the Biobank and highlight key lessons learned that may inform other efforts to build biobanking efforts in health system contexts.

3.
BMC Med Inform Decis Mak ; 22(1): 23, 2022 01 28.
Article in English | MEDLINE | ID: mdl-35090449

ABSTRACT

INTRODUCTION: Currently, one of the commonly used methods for disseminating electronic health record (EHR)-based phenotype algorithms is providing a narrative description of the algorithm logic, often accompanied by flowcharts. A challenge with this mode of dissemination is the potential for under-specification in the algorithm definition, which leads to ambiguity and vagueness. METHODS: This study examines incidents of under-specification that occurred during the implementation of 34 narrative phenotyping algorithms in the electronic Medical Record and Genomics (eMERGE) network. We reviewed the online communication history between algorithm developers and implementers within the Phenotype Knowledge Base (PheKB) platform, where questions could be raised and answered regarding the intended implementation of a phenotype algorithm. RESULTS: We developed a taxonomy of under-specification categories via an iterative review process between two groups of annotators. Under-specifications that lead to ambiguity and vagueness were consistently found across narrative phenotype algorithms developed by all involved eMERGE sites. DISCUSSION AND CONCLUSION: Our findings highlight that under-specification is an impediment to the accuracy and efficiency of the implementation of current narrative phenotyping algorithms, and we propose approaches for mitigating these issues and improved methods for disseminating EHR phenotyping algorithms.


Subject(s)
Algorithms , Electronic Health Records , Genomics , Humans , Knowledge Bases , Phenotype
4.
J Stroke Cerebrovasc Dis ; 31(3): 106268, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34974241

ABSTRACT

OBJECTIVES: The pathogenesis of intracranial aneurysms is multifactorial and includes genetic, environmental, and anatomic influences. We aimed to identify image-based morphological parameters that were associated with middle cerebral artery (MCA) bifurcation aneurysms. MATERIALS AND METHODS: We evaluated three-dimensional morphological parameters obtained from CT angiography (CTA) or digital subtraction angiography (DSA) from 317 patients with unilateral MCA bifurcation aneurysms diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016. We chose the contralateral unaffected MCA bifurcation as the control group, in order to control for genetic and environmental risk factors. Diameters and angles of surrounding parent and daughter vessels of 634 MCAs were examined. RESULTS: Univariable and multivariable statistical analyses were performed to determine statistical significance. Sensitivity analyses with smaller (≤ 3 mm) aneurysms only and with angles excluded, were also performed. In a multivariable conditional logistic regression model we showed that smaller diameter size ratio (OR 0.0004, 95% CI 0.0001-0.15), larger daughter-daughter angles (OR 1.08, 95% CI 1.06-1.11) and larger parent-daughter angle ratios (OR 4.24, 95% CI 1.77-10.16) were significantly associated with MCA aneurysm presence after correcting for other variables. In order to account for possible changes to the vasculature by the aneurysm, a subgroup analysis of small aneurysms (≤ 3 mm) was performed and showed that the results were similar. CONCLUSIONS: Easily measurable morphological parameters of the surrounding vasculature of the MCA may provide objective metrics to assess MCA aneurysm formation risk in high-risk patients.
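The odds ratios and 95% confidence intervals quoted in this abstract are the usual way of reporting logistic regression results. As a reminder of how such figures are typically obtained from a fitted coefficient, here is a minimal sketch. It assumes a Wald-type interval (the abstract does not say which interval the authors used), and the coefficient and standard error are hypothetical values, not numbers from the study.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient into an odds ratio.

    beta: estimated log-odds coefficient for one predictor
    se:   its standard error
    z:    normal quantile for the interval (1.96 gives roughly 95%)
    Returns (odds_ratio, lower_bound, upper_bound).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, for illustration only.
or_, lo, hi = odds_ratio_with_ci(beta=0.077, se=0.012)
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to a predictor that is statistically significant at the matching level, which is how the associations listed in the abstract should be read.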


Subject(s)
Intracranial Aneurysm , Middle Cerebral Artery , Case-Control Studies , Computed Tomography Angiography , Female , Humans , Intracranial Aneurysm/diagnostic imaging , Middle Cerebral Artery/diagnostic imaging
5.
JAMIA Open ; 4(2): ooab036, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34113801

ABSTRACT

Clinical data networks that leverage large volumes of data in electronic health records (EHRs) are significant resources for research on coronavirus disease 2019 (COVID-19). Data harmonization is a key challenge in seamless use of multisite EHRs for COVID-19 research. We developed a COVID-19 application ontology in the national Accrual to Clinical Trials (ACT) network that enables harmonization of data elements that are critical to COVID-19 research. The ontology contains over 50 000 concepts in the domains of diagnosis, procedures, medications, and laboratory tests. In particular, it has computational phenotypes to characterize the course of illness and outcomes, derived terms, and harmonized value sets for severe acute respiratory syndrome coronavirus 2 laboratory tests. The ontology was deployed and validated on the ACT COVID-19 network that consists of 9 academic health centers with data on 14.5M patients. This ontology, which is freely available to the entire research community on GitHub at https://github.com/shyamvis/ACT-COVID-Ontology, will be useful for harmonizing EHRs for COVID-19 research beyond the ACT network.

6.
NPJ Digit Med ; 4(1): 70, 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33850243

ABSTRACT

Chronic Kidney Disease (CKD) represents a slowly progressive disorder that is typically silent until late stages, but early intervention can significantly delay its progression. We designed a portable and scalable electronic CKD phenotype to facilitate early disease recognition and empower large-scale observational and genetic studies of kidney traits. The algorithm uses a combination of rule-based and machine-learning methods to automatically place patients on the staging grid of albuminuria by glomerular filtration rate ("A-by-G" grid). We manually validated the algorithm by 451 chart reviews across three medical systems, demonstrating overall positive predictive value of 95% for CKD cases and 97% for healthy controls. Independent case-control validation using 2350 patient records demonstrated diagnostic specificity of 97% and sensitivity of 87%. Application of the phenotype to 1.3 million patients demonstrated that over 80% of CKD cases are undetected using ICD codes alone. We also demonstrated several large-scale applications of the phenotype, including identifying stage-specific kidney disease comorbidities, in silico estimation of kidney trait heritability in thousands of pedigrees reconstructed from medical records, and biobank-based multicenter genome-wide and phenome-wide association studies.
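To make the "A-by-G" grid concrete, the sketch below places a patient in a grid cell using the standard KDIGO category thresholds for eGFR (G1 to G5) and urine albumin-to-creatinine ratio (A1 to A3). It is only an illustration of the grid itself: the published phenotype combines rules with machine learning and handles issues such as chronicity and missing laboratory values, none of which are modeled here. The function names are hypothetical.

```python
def gfr_category(egfr):
    """KDIGO G category from eGFR in mL/min/1.73 m^2."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def albuminuria_category(uacr):
    """KDIGO A category from urine albumin-to-creatinine ratio in mg/g."""
    if uacr < 30:
        return "A1"
    if uacr <= 300:
        return "A2"
    return "A3"


def a_by_g_cell(egfr, uacr):
    """Return the (G, A) cell of the staging grid for one pair of lab values."""
    return gfr_category(egfr), albuminuria_category(uacr)


# Hypothetical lab values, for illustration only.
print(a_by_g_cell(egfr=52, uacr=150))
```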

7.
Sci Rep ; 11(1): 4791, 2021 02 26.
Article in English | MEDLINE | ID: mdl-33637879

ABSTRACT

We present a cohort of patients with anterior communicating artery (ACoA) aneurysms to investigate morphological characteristics and clinical factors associated with rupture of the aneurysms. 505 patients with ACoA aneurysms were identified at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016, with available CT angiography (CTA). Three-dimensional (3D) reconstructions were performed to evaluate aneurysmal morphologic features, including location, projection, irregularity, the presence of daughter dome, height, height/width ratio, and relationships between surrounding vessels. Patient risk factors assessed included patient age, sex, tobacco use, alcohol use, and family history of aneurysms and aneurysmal subarachnoid hemorrhage. Logistic regression was used to build a predictive ACoA score for rupture. Morphologic features associated with ruptured ACoA aneurysms were the presence of a daughter dome (OR 21.4, 95% CI 10.6-43.1), smaller neck diameter (OR 0.55, 95% CI 0.42-0.71), larger aspect ratio (OR 3.57, 95% CI 2.05-6.24), larger flow angle (OR 1.03, 95% CI 1.02-1.05), and smaller ipsilateral A2-ACoA angle (OR 0.98, 95% CI 0.97-1.00). Tobacco use was predominantly associated with morphological factors intrinsic to the aneurysm that were associated with rupture while younger age was also associated with morphologic features extrinsic to the aneurysm that were associated with rupture. The ACoA score had good predictive capacity for rupture with AUC = 0.92 using the 0.632 bootstrap cross-validation for correction of overfitting bias. Ruptured ACoA aneurysms were associated with morphological features that are simple to assess using a simple scoring system. Tobacco use and younger age were predominantly associated with intrinsic and extrinsic morphological features characteristic of rupture, respectively.
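The 0.632 bootstrap mentioned above corrects an over-optimistic in-sample estimate by blending it with the estimate obtained on observations left out of each bootstrap resample. A minimal sketch of the final blending step follows. It assumes the two inputs have already been computed (the model must be refitted on every resample to get the out-of-bag figure, which is not shown), it applies the standard weights directly to a performance measure such as AUC, and the example values are hypothetical rather than taken from the study.

```python
def estimate_632(apparent, out_of_bag):
    """Blend in-sample and out-of-bag performance with the 0.632 weights.

    apparent:   performance of the model on the data it was fitted to
    out_of_bag: mean performance on observations excluded from each
                bootstrap resample, across resamples
    """
    return 0.368 * apparent + 0.632 * out_of_bag


# Hypothetical inputs, for illustration only.
corrected = estimate_632(apparent=0.90, out_of_bag=0.80)
print(round(corrected, 4))
```

The corrected value always lies between the two inputs, closer to the out-of-bag estimate.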


Subject(s)
Aneurysm, Ruptured/epidemiology , Anterior Cerebral Artery/pathology , Intracranial Aneurysm/epidemiology , Tobacco Use/epidemiology , Adult , Age Factors , Aged , Aneurysm, Ruptured/pathology , Female , Humans , Intracranial Aneurysm/pathology , Male , Middle Aged , Risk Factors
8.
Sci Rep ; 11(1): 2526, 2021 01 28.
Article in English | MEDLINE | ID: mdl-33510194

ABSTRACT

Morphological factors of intracranial aneurysms and the surrounding vasculature could affect aneurysm rupture risk in a location specific manner. Our goal was to identify image-based morphological parameters that correlated with ruptured basilar tip aneurysms. Three-dimensional morphological parameters obtained from CT-angiography (CTA) or digital subtraction angiography (DSA) from 200 patients with basilar tip aneurysms diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016 were evaluated. We examined aneurysm wall irregularity, the presence of daughter domes, hypoplastic, aplastic or fetal PCoAs, vertebral dominance, maximum height, perpendicular height, width, neck diameter, aspect and size ratio, height/width ratio, and diameters and angles of surrounding parent and daughter vessels. Univariable and multivariable statistical analyses were performed to determine statistical significance. In multivariable analysis, presence of a daughter dome, aspect ratio, and larger flow angle were significantly associated with rupture status. We also introduced two new variables, diameter size ratio and parent-daughter angle ratio, which were both significantly inversely associated with ruptured basilar tip aneurysms. Notably, multivariable analyses also showed that larger diameter size ratio was associated with higher Hunt-Hess score while smaller flow angle was associated with higher Fisher grade. These easily measurable parameters, including a new parameter that is unlikely to be affected by the formation of the aneurysm, could aid in screening strategies in high-risk patients with basilar tip aneurysms. One should note, however, that the changes in parameters related to aneurysm morphology may be secondary to aneurysm rupture rather than causal.


Subject(s)
Aneurysm, Ruptured/diagnostic imaging , Aneurysm, Ruptured/pathology , Basilar Artery/diagnostic imaging , Basilar Artery/pathology , Intracranial Aneurysm/diagnostic imaging , Intracranial Aneurysm/pathology , Aged , Aneurysm, Ruptured/etiology , Cerebral Angiography , Computed Tomography Angiography , Female , Humans , Image Processing, Computer-Assisted , Imaging, Three-Dimensional , Male , Middle Aged , Risk Factors
9.
World Neurosurg ; 146: e1318-e1325, 2021 02.
Article in English | MEDLINE | ID: mdl-33307259

ABSTRACT

OBJECTIVE: To identify clinical and morphologic risk factors correlated with anterior communicating artery (ACoA) aneurysm formation. METHODS: Three-dimensional morphologic parameters obtained from computed tomography angiography or digital subtraction angiography from 504 patients with ACoA aneurysms and 201 patients with aneurysms in other locations that were diagnosed at Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016 were evaluated. The presence of hypoplastic and aplastic A1 segments and diameters and angles of surrounding parent and daughter vessels were examined. Univariable and multivariable statistical analyses were performed to determine statistical significance. Sensitivity analyses for small (≤3 mm) aneurysms only were also performed. RESULTS: Aplastic and hypoplastic A1 segments were more common in the ACoA group (38.9% vs. 6.5% hypoplastic and 22.2% vs. 0.5% aplastic). In multivariable analysis, the presence of a hypoplastic A1 segment was associated with ACoA aneurysms. An A2-ACoA (daughter-daughter) angle was also significantly associated with ACoA aneurysms in multivariable analysis; however, as Pearson's correlation test between aneurysm width and daughter-daughter angle was significant, the daughter-daughter angle was most likely not independently associated with aneurysm presence, but rather might have been a result of the presence of an aneurysm. Subgroup analyses of small aneurysms (≤3 mm) and of unruptured aneurysms showed similar results. CONCLUSIONS: Our results demonstrate that of all the morphologic parameters, the presence of a hypoplastic A1 segment was the only parameter independently associated with the presence of ACoA aneurysms that was not correlated with aneurysm size and could aid as a simple screening parameter.


Subject(s)
Aneurysm, Ruptured/diagnostic imaging , Anterior Cerebral Artery/diagnostic imaging , Circle of Willis/diagnostic imaging , Intracranial Aneurysm/diagnostic imaging , Adult , Aged , Anterior Cerebral Artery/pathology , Case-Control Studies , Cerebral Angiography , Circle of Willis/pathology , Computed Tomography Angiography , Female , Humans , Imaging, Three-Dimensional , Male , Middle Aged , Organ Size
10.
Sci Rep ; 10(1): 17928, 2020 10 21.
Article in English | MEDLINE | ID: mdl-33087795

ABSTRACT

Hemodynamic stress is thought to play an important role in the formation of intracranial aneurysms, which is conditioned by the geometry of the surrounding vasculature. Our goal was to identify image-based morphological parameters that were associated with basilar artery tip aneurysms (BTA) in a location-specific manner. Three-dimensional morphological parameters obtained from CT-angiography (CTA) or digital subtraction angiography (DSA) from 207 patients with BTAs and a control group of 106 patients with aneurysms elsewhere to control for non-morphological factors, who were diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016, were evaluated. We examined the presence of hypoplastic, aplastic or fetal PCoAs, vertebral dominance, and diameters and angles of surrounding parent and daughter vessels. Univariable and multivariable statistical analyses were performed to determine statistical significance. Sensitivity analyses with small (≤ 3 mm) aneurysms only and with angles excluded, were also performed. In multivariable analysis, daughter-daughter angle was directly, and parent artery diameter and diameter size ratio were inversely associated with BTAs. These results remained significant in the subgroup analysis of small aneurysms (width ≤ 3 mm) and when angles were excluded. These easily measurable and robust parameters that are unlikely to be affected by aneurysm formation could aid in risk stratification for the formation of BTAs in high-risk patients.


Subject(s)
Basilar Artery/pathology , Intracranial Aneurysm/etiology , Intracranial Aneurysm/pathology , Adult , Aged , Angiography, Digital Subtraction , Basilar Artery/diagnostic imaging , Computed Tomography Angiography , Female , Humans , Intracranial Aneurysm/diagnostic imaging , Male , Middle Aged , Risk
11.
Sci Rep ; 10(1): 11545, 2020 07 14.
Article in English | MEDLINE | ID: mdl-32665589

ABSTRACT

Risk of intracranial aneurysm rupture could be affected by geometric features of intracranial aneurysms and the surrounding vasculature in a location specific manner. Our goal is to investigate the morphological characteristics associated with ruptured posterior communicating artery (PCoA) aneurysms, as well as patient factors associated with the morphological parameters. Three-dimensional morphological parameters in 409 patients with 432 PCoA aneurysms diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016 who had available CT angiography (CTA) or digital subtraction angiography (DSA) were evaluated. Morphological parameters examined included aneurysm wall irregularity, presence of a daughter dome, presence of hypoplastic or aplastic A1 arteries and hypoplastic or fetal PCoA, perpendicular height, width, neck diameter, aspect and size ratio, height/width ratio, and diameters and angles of surrounding parent and daughter vessels. Univariable and multivariable statistical analyses were performed to determine the association of morphological parameters with rupture of PCoA aneurysms. Additional analyses were performed to determine the association of patient factors with the morphological parameters. Irregular, multilobed PCoA aneurysms with larger height/width ratios and larger flow angles were associated with ruptured PCoA aneurysms, whereas perpendicular height was inversely associated with rupture in a multivariable model. Older age was associated with lower aspect ratio, with a trend towards lower height/width ratio and smaller flow angle, features that are associated with a lower rupture risk. Morphological parameters are easy to assess and could help in risk stratification in patients with unruptured PCoA aneurysms. PCoA aneurysms diagnosed at older age have morphological features associated with lower risk.


Subject(s)
Aneurysm, Ruptured/physiopathology , Intracranial Aneurysm/physiopathology , Age Factors , Aged , Aneurysm, Ruptured/diagnostic imaging , Cerebral Angiography , Computed Tomography Angiography , Female , Humans , Image Processing, Computer-Assisted , Imaging, Three-Dimensional , Intracranial Aneurysm/diagnostic imaging , Male , Middle Aged , Multivariate Analysis , Natural Language Processing , Registries , Retrospective Studies , Risk
12.
Clin Gastroenterol Hepatol ; 18(8): 1890-1892, 2020 07.
Article in English | MEDLINE | ID: mdl-31404664

ABSTRACT

Crohn's disease (CD) and ulcerative colitis (UC) are heterogeneous. With availability of therapeutic classes with distinct immunologic mechanisms of action, it has become imperative to identify markers that predict likelihood of response to each drug class. However, robust development of such tools has been challenging because of need for large prospective cohorts with systematic and careful assessment of treatment response using validated indices. Most hospitals in the United States use electronic health records (EHRs) that warehouse a large amount of narrative (free-text) and codified (administrative) data generated during routine clinical care. These data have been used to construct virtual disease cohorts for epidemiologic research as well as for defining genetic basis of disease states or discrete laboratory values (1-3). Whether EHR-based data can be used to validate genetic associations for more nuanced outcomes such as treatment response has not been examined previously.


Subject(s)
Colitis, Ulcerative , Crohn Disease , Inflammatory Bowel Diseases , Electronic Health Records , Humans , Inflammatory Bowel Diseases/drug therapy , Prospective Studies , United States
13.
J Biomed Inform ; 99: 103293, 2019 11.
Article in English | MEDLINE | ID: mdl-31542521

ABSTRACT

BACKGROUND: Implementation of phenotype algorithms requires phenotype engineers to interpret human-readable algorithms and translate the description (text and flowcharts) into computable phenotypes - a process that can be labor intensive and error prone. To address the critical need for reducing the implementation efforts, it is important to develop portable algorithms. METHODS: We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified common customization tasks required for implementation. A novel scoring system was developed to quantify portability from three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers were asked to estimate the average time spent on each category and evaluate time saving enabled by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model, for each category. RESULTS: A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks are related to interpretation, 613 tasks are related to knowledge conversion, and 469 tasks are related to programming. A score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability) is assigned for each aspect, yielding a total KIP score range of 0 to 6. The average clause-wise KIP score to reflect portability is 1.37 ± 1.38. Specifically, the average knowledge (K) score is 0.64 ± 0.66, interpretation (I) score is 0.33 ± 0.55, and programming (P) score is 0.40 ± 0.64. 5% of the categories can be completed within one hour (median). 70% of the categories take from days to months to complete. The OMOP model can assist with vocabulary mapping tasks. CONCLUSION: This study presents firsthand knowledge of the substantial implementation efforts in phenotyping and introduces a novel metric (KIP) to measure portability of phenotype algorithms for quantifying such efforts across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize the portability in regards to knowledge, interpretation and programming. CDMs can be used to improve the portability for some 'knowledge-oriented' tasks.
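The KIP score described in this abstract is a sum of three per-clause ratings, each on a 0 to 2 scale. The sketch below shows that arithmetic under the assumption that a clause is represented by three integer ratings; the class and function names are hypothetical and the example clauses are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseScore:
    """Portability ratings for one phenotype clause (0 easy, 1 moderate, 2 difficult)."""

    knowledge: int
    interpretation: int
    programming: int

    def __post_init__(self):
        # Reject ratings outside the 0-2 scale described in the abstract.
        for value in (self.knowledge, self.interpretation, self.programming):
            if value not in (0, 1, 2):
                raise ValueError("each aspect must be scored 0, 1, or 2")

    @property
    def kip(self):
        """Total KIP score for the clause, ranging from 0 to 6."""
        return self.knowledge + self.interpretation + self.programming


def mean_kip(clauses):
    """Average clause-wise KIP score across an algorithm's clauses."""
    clauses = list(clauses)
    return sum(c.kip for c in clauses) / len(clauses)


# Invented clauses, for illustration only.
example = [ClauseScore(2, 0, 1), ClauseScore(0, 0, 0), ClauseScore(1, 1, 1)]
print(mean_kip(example))
```

The reported figures are consistent with this additive definition: the component means 0.64, 0.33, and 0.40 sum to the overall mean of 1.37, and the task counts 25 + 46 + 613 + 469 add up to the stated 1153.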


Subject(s)
Electronic Health Records/classification , Medical Informatics/methods , Algorithms , Genomics , Humans , Phenotype , Retrospective Studies
14.
J Biomed Inform ; 96: 103253, 2019 08.
Article in English | MEDLINE | ID: mdl-31325501

ABSTRACT

BACKGROUND: Implementing clinical phenotypes across a network is labor intensive and potentially error prone. Use of a common data model may facilitate the process. METHODS: Electronic Medical Records and Genomics (eMERGE) sites implemented the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model across their electronic health record (EHR)-linked DNA biobanks. Two previously implemented eMERGE phenotypes were converted to OMOP and implemented across the network. RESULTS: It was feasible to implement the common data model across sites, with laboratory data producing the greatest challenge due to local encoding. Sites were then able to execute the OMOP phenotype in less than one day, as opposed to weeks of effort to manually implement an eMERGE phenotype in their bespoke research EHR databases. Of the sites that could compare the current OMOP phenotype implementation with the original eMERGE phenotype implementation, specific agreement ranged from 100% to 43%, with disagreements due to the original phenotype, the OMOP phenotype, changes in data, and issues in the databases. Using the OMOP query as a standard comparison revealed differences in the original implementations despite starting from the same definitions, code lists, flowcharts, and pseudocode. CONCLUSION: Using a common data model can dramatically speed phenotype implementation at the cost of having to populate that data model, though this will produce a net benefit as the number of phenotype implementations increases. Inconsistencies among the implementations of the original queries point to a potential benefit of using a common data model so that actual phenotype code and logic can be shared, mitigating human error in reinterpretation of a narrative phenotype definition.
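The abstract reports "specific agreement" between two implementations of the same phenotype without giving a formula. One common definition is positive specific agreement, 2a / (2a + b + c), where a is the number of patients flagged by both implementations and b and c are the numbers flagged by only one of them. The sketch below assumes that definition and represents each implementation's output as a set of patient identifiers; the function name and the identifiers are hypothetical.

```python
def positive_specific_agreement(cases_a, cases_b):
    """Positive specific agreement between two case lists.

    Equivalent to 2a / (2a + b + c), since the denominator is simply the
    size of the first case list plus the size of the second.
    """
    set_a, set_b = set(cases_a), set(cases_b)
    total = len(set_a) + len(set_b)
    if total == 0:
        raise ValueError("at least one implementation must flag a patient")
    return 2 * len(set_a & set_b) / total


# Hypothetical patient identifiers, for illustration only.
original = {"p1", "p2", "p3"}
omop = {"p2", "p3", "p4"}
print(positive_specific_agreement(original, omop))
```

Because the measure ignores patients that neither implementation flags, it is not inflated by the large number of non-cases in a typical research database.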


Subject(s)
Attention Deficit Disorder with Hyperactivity/diagnosis , Databases, Factual , Diabetes Mellitus, Type 2/diagnosis , Electronic Health Records , Data Collection , Humans , Medical Informatics , National Human Genome Research Institute (U.S.) , Observational Studies as Topic , Outcome Assessment, Health Care , Phenotype , Research Design , Software , United States
15.
J Am Med Inform Assoc ; 25(11): 1540-1546, 2018 11 01.
Article in English | MEDLINE | ID: mdl-30124903

ABSTRACT

Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With PhEMA, we converted an algorithm for benign prostatic hyperplasia, developed for the electronic Medical Records and Genomics network (eMERGE), into a standards-based computable format. Eight sites (7 within eMERGE) received the computable algorithm, and 6 successfully executed it against local data warehouses and/or i2b2 instances. Blinded random chart review of cases selected by the computable algorithm shows PPV ≥90%, and 3 out of 5 sites had >90% overlap of selected cases when comparing the computable algorithm to their original eMERGE implementation. This case study demonstrates potential use of PhEMA computable representations to automate phenotyping across different EHR systems, but also highlights some ongoing challenges.


Subject(s)
Algorithms , Electronic Health Records , Phenotype , Prostatic Hyperplasia/diagnosis , Data Warehousing , Databases, Factual , Genomics , Humans , Male , Organizational Case Studies , Prostatic Hyperplasia/genetics
16.
Dig Dis Sci ; 63(7): 1794-1800, 2018 07.
Article in English | MEDLINE | ID: mdl-29696479

ABSTRACT

BACKGROUND: ADR is a widely used colonoscopy quality indicator. Calculation of ADR is labor-intensive and cumbersome using current electronic medical databases. Natural language processing (NLP) is a method used to extract meaning from unstructured or free text data. AIMS: (1) To develop and validate an accurate automated process for calculation of adenoma detection rate (ADR) and serrated polyp detection rate (SDR) on data stored in widely used electronic health record systems, specifically Epic electronic health record system, Provation® endoscopy reporting system, and Sunquest PowerPath pathology reporting system. METHODS: Screening colonoscopies performed between June 2010 and August 2015 were identified using the Provation® reporting tool. An NLP pipeline was developed to identify adenomas and sessile serrated polyps (SSPs) on pathology reports corresponding to these colonoscopy reports. The pipeline was validated using a manual search. Precision, recall, and effectiveness of the natural language processing pipeline were calculated. ADR and SDR were then calculated. RESULTS: We identified 8032 screening colonoscopies that were linked to 3821 pathology reports (47.6%). The NLP pipeline had an accuracy of 100% for adenomas and 100% for SSPs. Mean total ADR was 29.3% (range 14.7-53.3%); mean male ADR was 35.7% (range 19.7-62.9%); and mean female ADR was 24.9% (range 9.1-51.0%). Mean total SDR was 4.0% (0-9.6%). CONCLUSIONS: We developed and validated an NLP pipeline that accurately and automatically calculates ADRs and SDRs using data stored in Epic, Provation® and Sunquest PowerPath. This NLP pipeline can be used to evaluate colonoscopy quality parameters at both individual and practice levels.
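Adenoma detection rate is conventionally the proportion of screening colonoscopies in which at least one adenoma is found, and the pipeline is validated with precision and recall against manual review. The sketch below shows both calculations, assuming the NLP step has already reduced each colonoscopy to a true/false flag. The function names and the example flags are hypothetical, and no part of the actual NLP pipeline is reproduced.

```python
def detection_rate(flags):
    """Proportion of screening colonoscopies with at least one finding."""
    flags = list(flags)
    if not flags:
        raise ValueError("no colonoscopies supplied")
    return sum(flags) / len(flags)


def precision_recall(predicted, actual):
    """Precision and recall of predicted flags against manual review."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical flags for four colonoscopies, for illustration only.
nlp_flags = [True, True, False, False]
manual_flags = [True, False, True, False]
print(detection_rate(nlp_flags))
print(precision_recall(nlp_flags, manual_flags))
```

The same `detection_rate` function applies to serrated polyp detection by passing flags for serrated polyps instead of adenomas, and per-endoscopist rates follow from grouping the flags by endoscopist first.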


Subject(s)
Adenocarcinoma/diagnosis , Adenomatous Polyps/diagnosis , Colonic Neoplasms/diagnosis , Colonic Polyps/diagnosis , Colonoscopy , Early Detection of Cancer/methods , Electronic Health Records , Natural Language Processing , Adenocarcinoma/pathology , Adenomatous Polyps/pathology , Automation , Colonic Neoplasms/pathology , Colonic Polyps/pathology , Colonoscopy/standards , Early Detection of Cancer/standards , Female , Humans , Male , Predictive Value of Tests , Quality Indicators, Health Care , Reproducibility of Results
17.
Stroke ; 49(1): 34-39, 2018 01.
Article in English | MEDLINE | ID: mdl-29203688

ABSTRACT

BACKGROUND AND PURPOSE: Previous studies have suggested a protective effect of diabetes mellitus on aneurysmal subarachnoid hemorrhage risk. However, reports are inconsistent, and objective measures of hyperglycemia in these studies are lacking. Our aim was to investigate the association between aneurysmal subarachnoid hemorrhage and antihyperglycemic agent use and glycated hemoglobin levels. METHODS: The medical records of 4701 patients with 6411 intracranial aneurysms, including 1201 prospective patients, diagnosed at the Massachusetts General Hospital and Brigham and Women's Hospital between 1990 and 2016 were reviewed and analyzed. Patients were separated into ruptured and nonruptured groups. Univariate and multivariate logistic regression analyses were performed to determine the association between aneurysmal subarachnoid hemorrhage and antihyperglycemic agents and glycated hemoglobin levels. Propensity score weighting was used to account for selection bias. RESULTS: In both unweighted and weighted multivariate analysis, antihyperglycemic agent use was inversely and significantly associated with ruptured aneurysms (unweighted odds ratio, 0.58; 95% confidence interval, 0.39-0.87; weighted odds ratio, 0.57; 95% confidence interval, 0.34-0.96). In contrast, glycated hemoglobin levels were not significantly associated with rupture status. CONCLUSIONS: Antihyperglycemic agent use rather than hyperglycemia is associated with decreased risk of aneurysmal subarachnoid hemorrhage, suggesting a possible protective effect of glucose-lowering agents in the pathogenesis of aneurysm rupture.


Subject(s)
Aneurysm, Ruptured , Glycated Hemoglobin/metabolism , Hypoglycemic Agents/administration & dosage , Intracranial Aneurysm , Subarachnoid Hemorrhage , Adult , Aged , Aneurysm, Ruptured/blood , Aneurysm, Ruptured/epidemiology , Aneurysm, Ruptured/etiology , Aneurysm, Ruptured/physiopathology , Female , Humans , Hypoglycemic Agents/adverse effects , Intracranial Aneurysm/blood , Intracranial Aneurysm/epidemiology , Intracranial Aneurysm/etiology , Intracranial Aneurysm/physiopathology , Male , Middle Aged , Risk Factors , Subarachnoid Hemorrhage/blood , Subarachnoid Hemorrhage/epidemiology , Subarachnoid Hemorrhage/etiology , Subarachnoid Hemorrhage/physiopathology
18.
J Am Med Inform Assoc ; 25(1): 54-60, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29126253

ABSTRACT

Objective: Electronic health record (EHR)-based phenotyping infers whether a patient has a disease based on the information in his or her EHR. A human-annotated training set with gold-standard disease status labels is usually required to build an algorithm for phenotyping based on a set of predictive features. The time intensiveness of annotation and feature curation severely limits the ability to achieve high-throughput phenotyping. While previous studies have successfully automated feature curation, annotation remains a major bottleneck. In this paper, we present PheNorm, a phenotyping algorithm that does not require expert-labeled samples for training. Methods: The most predictive features, such as the number of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes or mentions of the target phenotype, are normalized to resemble a normal mixture distribution with high area under the receiver operating characteristic curve (AUC) for prediction. The transformed features are then denoised and combined into a score for accurate disease classification. Results: We validated the accuracy of PheNorm with 4 phenotypes: coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis. The AUCs of the PheNorm score reached 0.90, 0.94, 0.95, and 0.94 for the 4 phenotypes, respectively, which were comparable to the accuracy of supervised algorithms trained with sample sizes of 100-300, with no statistically significant difference. Conclusion: The accuracy of the PheNorm algorithms is on par with algorithms trained with annotated samples. PheNorm fully automates the generation of accurate phenotyping algorithms and demonstrates the capacity for EHR-driven annotations to scale to the next level: phenotypic big data.
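The AUC used above to summarize accuracy can be read as the probability that a randomly chosen patient with the disease receives a higher score than a randomly chosen patient without it. The sketch below computes it by direct pairwise comparison; the scores and labels are made-up illustrative values, not data from the study, and the function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical phenotype scores and gold-standard labels for five patients.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_labels = [1, 0, 1, 0, 0]
example_auc = auc(example_scores, example_labels)
print(round(example_auc, 3))
```

The pairwise form is quadratic in the number of patients; it is used here only because it makes the definition explicit.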


Subject(s)
Algorithms , Big Data , Electronic Health Records , Phenotype , Area Under Curve , Datasets as Topic , Humans , Intercellular Signaling Peptides and Proteins , International Classification of Diseases , Peptides , Precision Medicine
19.
Arthritis Rheumatol ; 69(4): 742-749, 2017 04.
Article in English | MEDLINE | ID: mdl-27792870

ABSTRACT

OBJECTIVE: Patients with rheumatoid arthritis (RA) develop autoantibodies against a spectrum of antigens, but the clinical significance of these autoantibodies is unclear. Using a phenome-wide association study (PheWAS) approach, we examined the association between autoantibodies and clinical subphenotypes of RA. METHODS: This study was conducted in a cohort of RA patients identified from the electronic medical records (EMRs) of 2 tertiary care centers. Using a published multiplex bead assay, we measured 36 autoantibodies targeting epitopes implicated in RA. We extracted all International Classification of Diseases, Ninth Revision (ICD-9) codes for each subject and grouped them into disease categories (PheWAS codes), using a published method. We tested for the association of each autoantibody (grouped by the targeted protein) with PheWAS codes. To determine significant associations (at a false discovery rate [FDR] of ≤0.1), we reviewed the medical records of 50 patients with each PheWAS code to determine positive predictive values (PPVs). RESULTS: We studied 1,006 RA patients; the mean ± SD age of the patients was 61.0 ± 12.9 years, and 79.0% were female. A total of 3,568 unique ICD-9 codes were grouped into 625 PheWAS codes; the 206 PheWAS codes with a prevalence of ≥3% were studied. Using the PheWAS method, we identified 24 significant associations of autoantibodies to epitopes at an FDR of ≤0.1. The associations that were strongest and had the highest PPV for the PheWAS code were those between autoantibodies against fibronectin and obesity (P = 6.1 × 10⁻⁴, PPV 100%) and between autoantibodies against fibrinogen and pneumonopathy (P = 2.7 × 10⁻⁴, PPV 96%). Pneumonopathy codes included diagnoses for cryptogenic organizing pneumonia and obliterative bronchiolitis. CONCLUSION: We demonstrated application of a bioinformatics method, the PheWAS, to screen for the clinical significance of RA-related autoantibodies. Using the PheWAS approach, we identified potentially significant links between variations in the levels of autoantibodies and comorbidities of interest in RA.
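Screening many autoantibody and PheWAS-code pairs at an FDR threshold of 0.1 can be sketched as follows. The abstract does not name the FDR procedure; the Benjamini-Hochberg step-up rule is assumed here as a common choice, and the P values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans marking which tests are declared significant.

    Assumption: Benjamini-Hochberg step-up procedure (not confirmed by the abstract).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered P value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant

# Hypothetical P values for five association tests.
example_p = [0.001, 0.2, 0.03, 0.04, 0.5]
example_hits = benjamini_hochberg(example_p, fdr=0.1)
print(example_hits)
```

The step-up rule keeps every test up to the largest qualifying rank, so a P value can be declared significant even if it misses its own rank threshold, provided a larger P value meets its threshold.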


Subject(s)
Arthritis, Rheumatoid/genetics , Arthritis, Rheumatoid/immunology , Autoantibodies/genetics , Epitopes , Peptides, Cyclic/immunology , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Phenotype
20.
J Am Med Inform Assoc ; 24(e1): e143-e149, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-27632993

ABSTRACT

OBJECTIVE: Phenotyping algorithms are capable of accurately identifying patients with specific phenotypes from within electronic medical records systems. However, developing phenotyping algorithms in a scalable way remains a challenge due to the extensive human resources required. This paper introduces a high-throughput unsupervised feature selection method, which improves the robustness and scalability of electronic medical record phenotyping without compromising its accuracy. METHODS: The proposed Surrogate-Assisted Feature Extraction (SAFE) method selects candidate features from a pool of comprehensive medical concepts found in publicly available knowledge sources. The target phenotype's International Classification of Diseases, Ninth Revision and natural language processing counts, acting as noisy surrogates to the gold-standard labels, are used to create silver-standard labels. Candidate features highly predictive of the silver-standard labels are selected as the final features. RESULTS: Algorithms were trained to identify patients with coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis using various numbers of labels to compare the performance of features selected by SAFE, by a previously published automated feature extraction for phenotyping procedure, and by domain experts. The out-of-sample area under the receiver operating characteristic curve and F-score from SAFE algorithms were remarkably higher than those from the other two, especially at small label sizes. CONCLUSION: SAFE advances high-throughput phenotyping methods by automatically selecting a succinct set of informative features for algorithm training, which in turn reduces overfitting and the needed number of gold-standard labels. SAFE also potentially identifies important features missed by automated feature extraction for phenotyping or experts.
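The silver-standard idea described above can be illustrated with a minimal sketch: patients at the extremes of a noisy surrogate count receive provisional labels, and candidate features are then ranked by how well they separate those labels. The thresholds, the feature names, and the mean-difference ranking are all illustrative assumptions; the abstract does not specify the model SAFE uses for this step, and this is not the published implementation.

```python
from statistics import mean

def silver_labels(surrogate_counts, low=0, high=3):
    """Label the extremes of a noisy surrogate; leave the ambiguous middle unlabeled."""
    labels = []
    for count in surrogate_counts:
        if count >= high:
            labels.append(1)      # likely case
        elif count <= low:
            labels.append(0)      # likely non-case
        else:
            labels.append(None)   # ambiguous, excluded from selection
    return labels

def rank_features(features, labels):
    """Rank features by absolute mean difference between silver cases and non-cases.

    Assumption: a simple stand-in for the predictive model used in the paper.
    """
    scores = {}
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[name] = abs(mean(pos) - mean(neg))
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical diagnosis-code counts and candidate features for six patients.
icd_counts = [5, 0, 4, 1, 0, 6]
candidates = {
    "drug_mention": [3, 0, 2, 1, 0, 4],
    "unrelated_lab": [1, 1, 0, 1, 1, 1],
}
labels = silver_labels(icd_counts)
ranking = rank_features(candidates, labels)
print(labels, ranking)
```

No chart review is needed to produce these labels, which is what allows feature selection to run before any gold-standard annotation exists.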


Subject(s)
Algorithms , Data Mining , Electronic Health Records , Humans , Machine Learning , Natural Language Processing , Phenotype